We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g., as measured by nearest-neighbor Euclidean distances between a test sample and the training samples. In many application domains there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including non-metric measures, and one can test for anomalies by scalarizing them with a non-negative linear combination. If the relative importance of the different dissimilarity measures is not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multi-criteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria and shows superior performance on experiments with synthetic and real data sets.
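
To make the notion of Pareto depth concrete, the following is a minimal sketch of one common way to assign depths: repeatedly peel off the set of non-dominated points (a Pareto front) and record the round in which each point is removed. It is an illustration under stated assumptions, not the implementation used in the paper. The function names `dominates` and `pareto_depths` are hypothetical, each point is assumed to be a vector of dissimilarities under the different criteria with smaller values treated as better, and how depths are turned into an anomaly score is not specified in the abstract and is left out here.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (smaller is better).

    `a` dominates `b` when it is no worse in every criterion and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_depths(points: List[Sequence[float]]) -> List[int]:
    """Assign each point the 1-based index of the Pareto front it lies on.

    Assumption: depth is defined by successive peeling of non-dominated
    points. This simple version takes O(n^2) comparisons per front.
    """
    remaining = set(range(len(points)))
    depth = [0] * len(points)
    level = 0
    while remaining:
        level += 1
        # A point is on the current front if nothing still remaining dominates it.
        front = {
            i
            for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            depth[i] = level
        remaining -= front
    return depth


# Five illustrative two-criterion dissimilarity vectors (made-up values).
example = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(pareto_depths(example))  # → [1, 1, 1, 2, 3]
```

No weights appear anywhere in this sketch: the first three points are mutually non-dominated and share the first front, whereas any single non-negative linear combination of the two criteria would impose one particular ranking on them.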